    Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems

    Humans tend to change their way of speaking when they are immersed in a noisy environment, a reflex known as the Lombard effect. Current speech enhancement systems based on deep learning do not usually take this change in speaking style into account, because they are trained with neutral (non-Lombard) speech utterances recorded under quiet conditions to which noise is artificially added. In this paper, we investigate the effects that the Lombard reflex has on the performance of audio-visual speech enhancement systems based on deep learning. The results show a performance gap of up to approximately 5 dB between the systems trained on neutral speech and those trained on Lombard speech. This indicates the benefit of accounting for the mismatch between neutral and Lombard speech in the design of audio-visual speech enhancement systems.

    On Comparison of Adaptive Regularization Methods

    This paper investigates recently suggested adaptive regularization schemes.

    The Use of Supervisor Feedback and Publicly Posted Feedback to Increase Safety in a Factory Environment

    The effects of safety-related and behaviorally relevant verbal and posted feedback from supervisors in a manufacturing plant were evaluated using a multiple baseline across behaviors design. During baseline, plant safety averaged 35.3% for the behaviors and conditions on Checklist 1, and 35.0% for the behaviors and conditions on Checklist 2. When verbal supervisory feedback was implemented, the plant safety average increased to 50.6% for Checklist 1, and 75.7% for Checklist 2. When posted supervisory feedback was added to the intervention package, the plant safety average further increased to 58.0% for Checklist 1, and 83.3% for Checklist 2. These results are consistent with previous findings that performance feedback can increase critical work behaviors.
    Key words: health, injury prevention, behavior-based safety, feedback, accidents.

    Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music

    In large MP3 databases, files are typically generated with different parameter settings, i.e., bit rates and sampling rates. This is of concern for MIR applications, as encoding differences can potentially confound metadata estimation and similarity evaluation. In this paper we discuss the influence of MP3 coding on the Mel frequency cepstral coefficients (MFCCs). The main result is that the widely used subset of the MFCCs is robust at bit rates equal to or higher than 128 kbit/s, for the implementations we have investigated. However, for lower bit rates, e.g., 64 kbit/s, the implementation of the Mel filter bank becomes an issue.

    On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement

    Audio-visual speech enhancement (AV-SE) is the task of improving speech quality and intelligibility in a noisy environment using audio and visual information from a talker. Recently, deep learning techniques have been adopted to solve the AV-SE task in a supervised manner. In this context, the choice of the target, i.e., the quantity to be estimated, and of the objective function, which quantifies the quality of this estimate, is critical for performance. This work is the first to present an experimental study of a range of different targets and objective functions used to train a deep-learning-based AV-SE system. The results show that the approaches that directly estimate a mask perform best overall in terms of estimated speech quality and intelligibility, although the model that directly estimates the log magnitude spectrum performs equally well in terms of estimated speech quality.

    Pruning the vocabulary for better context recognition

    Language-independent 'bag-of-words' representations are surprisingly effective for text classification. The representation is high dimensional, though, and contains many words that are non-consistent for text categorization. These non-consistent words reduce the generalization performance of subsequent classifiers, e.g., through ill-posed principal component transformations. In this communication our aim is to study the effect of removing the least relevant words from the bag-of-words representation. We consider a new approach that uses neural-network-based sensitivity maps and information gain to determine term relevancy when pruning the vocabularies. With reduced vocabularies, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Reducing the bag-of-words vocabularies by 90%-98%, we find consistent classification improvement on two mid-size data sets. We also study the applicability of information gain and sensitivity maps for automated keyword generation.
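
    As a rough illustration of the information-gain criterion mentioned in the abstract above, the sketch below ranks terms by how much knowing their presence reduces class entropy and keeps only the top fraction. It is a minimal sketch under stated assumptions, not the paper's method: the toy corpus, the function names (`entropy`, `information_gain`, `prune_vocabulary`) and the `keep_fraction` parameter are all hypothetical, documents are modelled as plain sets of words, and the sensitivity-map criterion is not covered.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def prune_vocabulary(docs, labels, keep_fraction):
    """Keep the `keep_fraction` of terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    n_keep = max(1, round(keep_fraction * len(vocab)))
    return ranked[:n_keep]


# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"vote", "party", "the"},
    {"vote", "law", "the"},
]
labels = ["sport", "sport", "politics", "politics"]

kept = prune_vocabulary(docs, labels, keep_fraction=0.25)
print(kept)
```

    In this toy corpus, "the" occurs in every document and so carries no information about the class, while "goal" and "vote" separate the two classes perfectly. A pruning rate of 90%-98%, as reported in the abstract, would correspond to a `keep_fraction` between 0.02 and 0.10.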